Abstract

The categorization properties of an attractor network of three-state
neurons which infers three-state concepts from examples are studied. The
evolution equations governing the parallel dynamics at zero temperature for
the overlap between the state of the network and the examples, the overlap
between the state of the network and the concepts, and the neuron activity
are discussed in the limit of extreme dilution. A transition from a
retrieval region to a categorization region is found when the number of
examples or their correlations are increased. If the pattern activity is
small enough, the examples (concepts) are very well retrieved (categorized)
for an appropriate choice of the zero-activity threshold of the neurons.
1. Introduction



Some years ago a minimal modification of the Hopfield model was suggested
such that categorization of patterns emerges naturally from an encoding
stage structured in layers [1]. The network is spatially homogeneous in the
sense that the synaptic couplings between all neurons are of the same type,
but the patterns are hierarchically ordered. This means that patterns
belonging to the same group are strongly correlated, while patterns in
distinct groups are only weakly correlated. Related models of this type have
been examined in […]. These models lead to the appearance of stable states
besides those corresponding to the original patterns, e.g., the ancestors of
the categories to which those patterns belong.

Shortly after, a simple Hebbian rule was proposed in order to study the
performance of a network in learning an extensive number of ancestor
patterns in such a hierarchical ordering, given that the learning takes
place with groups of a finite number of (correlated) patterns situated on a
lower level of the hierarchical tree […]. In other words, the problem of
categorizing examples (i.e., the correlated patterns) into classes defined
by concepts (i.e., the ancestors) is studied. It turns out that such a
network loses its ability to retrieve the examples when a critical number of
them is presented during the learning stage, but it then acquires the
ability to categorize the concepts […].

This categorization property, which in fact occurs through learning from
examples, is thus a particular kind of generalization. Generalization has
been a topic of intensive research in recent years. (For recent reviews
emphasizing different aspects see […].)



Recently, models with multi-state and analogue neurons have been introduced
in the study of categorization problems [15], [16]. By using analogue
neurons [15], fewer (binary) examples are needed in order to start
categorization. However, the generalization error, i.e., the Hamming
distance between the microscopic state of the network and the (binary)
concepts, is larger than in the corresponding two-state model. A further
improvement is obtained by using low-activity examples, from which (binary)
full-activity concepts can be inferred, even if the number of examples is
small [16]. This must be due to the fact that mixture states of patterns can
be inherently stable. This allows the network to ultimately form
higher-activity patterns out of lower-activity ones, just as happens in the
retrieval regime for both highly diluted [17] and fully connected three-state
networks [15].
In this paper we extend these models by allowing the concepts themselves to
be three-state. Furthermore, we do not require full symmetry of the
retrieval overlaps of the examples. At the same time we study the conditions
characterizing the transition from the retrieval phase to the categorization
phase. These problems are considered in an asymmetrically diluted network
with a Hebbian-type learning rule because, as is standard knowledge by now,
its parallel dynamics can be solved exactly [19]. The following main results
are found. When the number of examples (per concept) presented to the
network is too small or their correlations are too weak, the network behaves
as a retrieval (of examples) device. When one of these parameters becomes
large enough, the network starts categorizing. The latter behaviour can be
considerably improved by appropriately choosing the zero-activity threshold
of the neurons. Compared with the categorization properties of the binary
concept model [16], we find that the categorization error is smaller and
that a greater number of concepts can be categorized.

The rest of this paper is organized as follows. In Section 2 we introduce 
the model and the relevant Hamming distances as macroscopic measures for 
the retrieval and categorization quality of the model. Section 3 solves the 
parallel dynamics leading to evolution equations for the retrieval overlap, 
the categorization overlap and the neuron activity. In Section 4 we study 
the retrieval and categorization phases as a function of the zero-activity
threshold of the neurons, the number of concepts, the number of examples 
per concept, their correlations and their activity. Finally, Section 5 presents 
some concluding remarks. 

2. The model 

Consider a network of $N$ three-state neurons. At time $t$ and zero temperature
the neurons $\{\sigma_{i,t}\}$ are updated in parallel according to the rule

$$\sigma_{i,t+1} = F_\theta(h_{i,t}), \qquad i = 1,\dots,N, \tag{1}$$
$$h_{i,t} = \sum_{j \neq i} J_{ij}\,\sigma_{j,t}, \tag{2}$$

where $h_{i,t}$ is the local field of neuron $i$ at time $t$. The input-output
relation $F_\theta$ is, in general, a monotonic function and will later be
chosen as the three-state step-like function



$$F_\theta(x) = \begin{cases} \operatorname{sign}(x) & \text{if } |x| > \theta, \\ 0 & \text{if } |x| \le \theta, \end{cases} \tag{3}$$

where $\theta$ is the zero-activity threshold parameter of the neurons. The
synaptic couplings $J_{ij}$ are determined through the learning of $s$
three-state examples $\eta_i^{\mu\rho} \in \{0, \pm 1\}$, $\rho = 1,\dots,s$,
of each of $p$ three-state concepts $\xi_i^\mu \in \{0, \pm 1\}$,
$\mu = 1,\dots,p$. The examples have zero mean and variance
$A = \frac{1}{N}\sum_i (\eta_i^{\mu\rho})^2$, which is a measure of their
activity. The concepts $\xi_i^\mu$ are chosen to be independent identically
distributed random variables (i.i.d.r.v.) with mean zero and activity equal
to the activity $A$ of the examples. The following Hebbian-type algorithm is
used:

$$J_{ij} = \frac{1}{AN}\sum_{\mu=1}^{p}\sum_{\rho=1}^{s}\eta_i^{\mu\rho}\eta_j^{\mu\rho}. \tag{4}$$

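Furthermore, each set of examples $\{\eta_i^{\mu\rho}\}_{\rho=1}^{s}$ at site
$i = 1,\dots,N$ is built from the concept $\xi_i^\mu$ through the following
process

$$\eta_i^{\mu\rho} = \xi_i^\mu \lambda_i^{\mu\rho}, \qquad \lambda_i^{\mu\rho} \in \{\pm 1\}. \tag{5}$$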

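The variables $\lambda_i^{\mu\rho}$ are also taken to be i.i.d.r.v. with a bias
towards the value $+1$, such that they are given by the probability
distribution

$$P(\lambda_i^{\mu\rho}) = b_-\,\delta(\lambda_i^{\mu\rho}+1) + b_+\,\delta(\lambda_i^{\mu\rho}-1), \tag{6}$$

with $b_\pm = (1 \pm b)/2$. The parameter $b$ describes the correlation
between the stored example $\eta_i^{\mu\rho}$ and its concept, viz.
$\langle \eta_i^{\mu\rho}\xi_j^\mu \rangle = bA\,\delta_{ij}$. It also describes
the correlation between two different examples of the same concept,
$\langle \eta_i^{\mu\rho}\eta_j^{\mu\rho'} \rangle = b^2 A\,\delta_{ij}$ for
$\rho \neq \rho'$.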
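The construction of Eqs. (5) and (6) is easy to check numerically. The following Python sketch uses only the standard library. It samples concepts and examples and estimates the activity and the two correlations quoted above by site averages. The function names are hypothetical. The assumption that an active concept site is $+1$ or $-1$ with probability $A/2$ each is our own; it is one simple way to obtain concepts with mean zero and activity $A$.

```python
import random


def make_patterns(N, p, s, A, b, seed=0):
    """Sample concepts xi[mu][i] and examples eta[mu][rho][i] following Eqs. (5)-(6).

    Assumption: a concept site equals +1 or -1 with probability A/2 each and 0
    otherwise, so the concepts have zero mean and activity A.
    """
    rng = random.Random(seed)
    b_plus = (1.0 + b) / 2.0  # probability that lambda = +1, Eq. (6)
    xi, eta = [], []
    for _ in range(p):
        concept = []
        for _ in range(N):
            u = rng.random()
            concept.append(1 if u < A / 2 else (-1 if u < A else 0))
        examples = []
        for _ in range(s):
            lam = [1 if rng.random() < b_plus else -1 for _ in range(N)]
            examples.append([x * li for x, li in zip(concept, lam)])  # Eq. (5)
        xi.append(concept)
        eta.append(examples)
    return xi, eta


def site_average(u, v):
    """(1/N) sum_i u_i v_i."""
    return sum(x * y for x, y in zip(u, v)) / len(u)


if __name__ == "__main__":
    N, A, b = 200_000, 0.3, 0.5
    xi, eta = make_patterns(N, p=1, s=2, A=A, b=b)
    print("activity  :", site_average(eta[0][0], eta[0][0]), "expected", A)
    print("<eta xi>  :", site_average(eta[0][0], xi[0]), "expected", b * A)
    print("<eta eta'>:", site_average(eta[0][0], eta[0][1]), "expected", b * b * A)
```

Setting $s = 1$ and $b = 1$ makes every example identical to its concept. This is the standard three-state limit referred to in the next paragraph.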

At this point we remark that, on the one hand, we recover the binary
categorization model […] by setting $A = 1$ and $\theta = 0$. On the other
hand, the standard three-state neuron model [17] is obtained by taking the
number of examples per concept in Eq. (4) to be $s = 1$ and the correlation
$b = 1$.

In order to measure the quality of retrieval of the examples, we introduce
the Hamming distance between the stored example and the microscopic state
of the network [20]

$$D_t^{\mu\rho} = \frac{1}{N}\sum_i \left[\eta_i^{\mu\rho} - \sigma_{i,t}\right]^2 = A - 2A\,m_{N,t}^{\mu\rho} + Q_{N,t}. \tag{7}$$
This defines the retrieval overlap between the microscopic state of the
network and the $\rho$th example of the $\mu$th concept

$$m_{N,t}^{\mu\rho} = \frac{1}{AN}\sum_i \eta_i^{\mu\rho}\sigma_{i,t}. \tag{8}$$

These are normalized order parameters within the interval $[-1, 1]$, which
attain the maximal value $m_{N,t}^{\mu\rho} = 1$ whenever
$\sigma_{i,t} = \eta_i^{\mu\rho}$ (recall the definition of $A$). Furthermore,
the neuron activity is introduced

$$Q_{N,t} = \frac{1}{N}\sum_i |\sigma_{i,t}|^2. \tag{9}$$

The task of categorization is successful when the distance between the
microscopic state of the network and the concept, defined as

$$E_t^\mu = \frac{1}{N}\sum_i \left[\xi_i^\mu - \sigma_{i,t}\right]^2 = A - 2A\,M_{N,t}^\mu + Q_{N,t}, \tag{10}$$

becomes small after some time $t$. As explained in the introduction, the
quantity $E_t^\mu$ can be considered as the generalization error in this
context. (We refer to Refs. [11]-[14] for a comparison with definitions of
the generalization error in related contexts of learning from examples.) In
obtaining the second equality of Eq. (10) we have used the fact that the
activities of the concepts and the examples are taken to be equal.
Furthermore, the overlap between the microscopic state of the network and
the concept is defined as

$$M_{N,t}^\mu = \frac{1}{AN}\sum_i \xi_i^\mu \sigma_{i,t}. \tag{11}$$
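The sketch below gives a concrete check of the second equalities in Eqs. (7) and (10). It computes the order parameters (8), (9) and (11) together with both Hamming distances for a single network state. The helper name `order_parameters` is hypothetical. The identities hold exactly only when the empirical activities of the example and of the concept both equal $A$. This is the case in the hand-made ten-site example used here.

```python
import math


def order_parameters(sigma, eta, xi, A):
    """Return (m, M, Q, D, E) of Eqs. (8), (11), (9), (7) and (10).

    sigma: network state, eta: one example, xi: its concept (entries in {0, +1, -1}).
    """
    N = len(sigma)
    m = sum(e * s for e, s in zip(eta, sigma)) / (A * N)   # retrieval overlap, Eq. (8)
    M = sum(x * s for x, s in zip(xi, sigma)) / (A * N)    # categorization overlap, Eq. (11)
    Q = sum(s * s for s in sigma) / N                      # neuron activity, Eq. (9)
    D = sum((e - s) ** 2 for e, s in zip(eta, sigma)) / N  # Hamming distance, Eq. (7)
    E = sum((x - s) ** 2 for x, s in zip(xi, sigma)) / N   # generalization error, Eq. (10)
    return m, M, Q, D, E


if __name__ == "__main__":
    xi = [1, -1, 0, 0, 1, 0, -1, 0, 0, 0]     # concept with activity 0.4
    eta = [1, 1, 0, 0, 1, 0, -1, 0, 0, 0]     # example: second active site flipped
    sigma = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]    # some network state
    A = 0.4
    m, M, Q, D, E = order_parameters(sigma, eta, xi, A)
    # Expected: m = 0.75, M = 0.25, Q = 0.4, D = 0.2, E = 0.6 (up to rounding)
    assert math.isclose(D, A - 2 * A * m + Q)   # second equality of Eq. (7)
    assert math.isclose(E, A - 2 * A * M + Q)   # second equality of Eq. (10)
    print(m, M, Q, D, E)
```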

We now consider an extremely diluted asymmetric version of this model in
which each neuron is connected, on average, with $C$ other neurons through
the synaptic couplings (cf. Eq. (4))

$$J_{ij}(C) = \frac{N}{C}\,c_{ij}J_{ij} = \frac{c_{ij}}{AC}\sum_{\mu=1}^{p}\sum_{\rho=1}^{s}\eta_i^{\mu\rho}\eta_j^{\mu\rho}. \tag{12}$$

Here, the $c_{ij} \in \{0, 1\}$ are i.i.d.r.v. with probability
$\Pr\{c_{ij} = 1\} = C/N$, $C > 0$. Since $c_{ij}$ and $c_{ji}$ are chosen
independently, the dilution is highly asymmetric. In the limit of extreme
dilution, $C \ll \log N$, the architecture of the network acquires the
structure of a directed tree, and the neurons are uncorrelated for almost
all sites $i$. This allows an exact solution of the parallel dynamics [19].
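For intuition, a finite-size version of the dynamics (1)-(3) with the diluted couplings (12) can be simulated directly. The sketch below is only an illustration. The extreme-dilution limit $C \ll \log N$ cannot be reached at finite $N$. As a simplification of our own, every neuron receives exactly $C$ randomly chosen inputs instead of independent Bernoulli($C/N$) links $c_{ij}$. All function names are hypothetical.

```python
import random


def F_theta(x, theta):
    """Three-state transfer function of Eq. (3)."""
    if abs(x) > theta:
        return 1 if x > 0 else -1
    return 0


def make_patterns(N, p, s, A, b, rng):
    """Concepts xi[mu][i] and examples eta[mu][rho][i], Eqs. (5)-(6)."""
    b_plus = (1 + b) / 2
    xi = [[rng.choice((1, -1)) if rng.random() < A else 0 for _ in range(N)]
          for _ in range(p)]
    eta = [[[x * (1 if rng.random() < b_plus else -1) for x in concept]
            for _ in range(s)] for concept in xi]
    return xi, eta


def random_inputs(N, C, rng):
    """Simplification: each neuron gets exactly C distinct presynaptic partners j != i."""
    return [[j if j < i else j + 1 for j in rng.sample(range(N - 1), C)]
            for i in range(N)]


def parallel_step(sigma, inputs, eta, A, C, theta):
    """One zero-temperature parallel update, Eqs. (1)-(2), with J_ij(C) of Eq. (12)."""
    new_sigma = []
    for i, partners in enumerate(inputs):
        h = 0.0
        for j in partners:
            # J_ij(C) = (1/(A C)) sum_mu sum_rho eta_i^{mu rho} eta_j^{mu rho}
            J = sum(ex[i] * ex[j] for examples in eta for ex in examples) / (A * C)
            h += J * sigma[j]
        new_sigma.append(F_theta(h, theta))
    return new_sigma


if __name__ == "__main__":
    rng = random.Random(0)
    N, C, A, theta = 1000, 40, 0.5, 0.5
    xi, eta = make_patterns(N, p=2, s=3, A=A, b=0.6, rng=rng)
    inputs = random_inputs(N, C, rng)
    sigma = list(eta[0][0])                      # start on the first example
    for t in range(3):
        sigma = parallel_step(sigma, inputs, eta, A, C, theta)
        m11 = sum(e * x for e, x in zip(eta[0][0], sigma)) / (A * N)
        M1 = sum(c * x for c, x in zip(xi[0], sigma)) / (A * N)
        print(f"t={t + 1}: m11={m11:.3f}  M1={M1:.3f}")
```

With $p = 1$, $s = 1$ and $b = 1$ this reduces to the standard three-state model. Starting from the stored pattern, each active neuron then either keeps its value or is switched off by the threshold; it never flips sign.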



In the next section we discuss this solution by writing down the evolution 
equations for the retrieval overlap, the neuron activity and the categorization 
overlap. 



3. The diluted dynamics 

Because we are interested in both the retrieval and the categorization
properties of the network, we take an initial configuration correlated with
only one concept. This means that only the retrieval overlaps for the $s$
examples of that given concept, say the first one, are macroscopic, i.e., of
order $O(1)$ in the thermodynamic limit $N \to \infty$. In order to study the
retrieval of a particular example, we single out the component $\rho = 1$. We
furthermore assume that all other components are the same, i.e.,
$m_t^{1\rho} = m_t^{1s}$ for all $\rho > 1$. This property of the examples is
called quasi-symmetry. It extends former work (e.g., [16]) where full
symmetry of the examples has been assumed.

The dynamics of this model is then studied following standard methods
involving a signal-to-noise analysis (see, e.g., […]). At this point we
recall that it is justified to first dilute the system by taking the limit
$N \to \infty$. Second, in the diluted system, one applies the law of large
numbers (LLN) and the central limit theorem (CLT) by taking the limit
$C \to \infty$. Furthermore, the retrieval overlaps have to be considered
over the diluted structure, and the loading $\alpha$ is defined by
$p = \alpha C$. Finally, because of the extremely diluted structure of the
network, the equations derived for the first time step are valid for any
time step.

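Splitting the local field (2) into a signal and a noise part gives

$$h_{i,t} = \eta_i^{11} m_{C,t}^{11} + \sum_{\rho>1}^{s}\eta_i^{1\rho} m_{C,t}^{1\rho} + \sum_{\mu>1}^{p}\sum_{\rho=1}^{s}\sum_{j\neq i}^{N}\frac{c_{ij}}{AC}\,\eta_i^{\mu\rho}\eta_j^{\mu\rho}\sigma_{j,t} \tag{13}$$

with

$$m_{C,t}^{1\rho} = \frac{1}{AC}\sum_{j\neq i}^{N} c_{ij}\,\eta_j^{1\rho}\sigma_{j,t}. \tag{14}$$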



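In the thermodynamic limit we then obtain in a standard way ([16], [19], [20], […])

$$m_{t+1}^{11} = \left\langle\left\langle\left\langle \lambda^{11} F_\theta(h_t) \right\rangle_{\lambda^{11}}\right\rangle_{\chi_s}\right\rangle_{\omega_t}, \tag{15}$$

$$m_{t+1}^{1s} = \left\langle\left\langle\left\langle \chi_s F_\theta(h_t) \right\rangle_{\lambda^{11}}\right\rangle_{\chi_s}\right\rangle_{\omega_t}, \tag{16}$$

$$M_{t+1}^{1} = \left\langle\left\langle\left\langle F_\theta(h_t) \right\rangle_{\lambda^{11}}\right\rangle_{\chi_s}\right\rangle_{\omega_t}, \tag{17}$$

$$Q_{t+1} = A\left\langle\left\langle\left\langle [F_\theta(h_t)]^2 \right\rangle_{\lambda^{11}}\right\rangle_{\chi_s}\right\rangle_{\omega_t} + (1-A)\left\langle [F_\theta(\omega_t)]^2 \right\rangle_{\omega_t}, \tag{18}$$

with

$$h_t \stackrel{d}{=} \lambda^{11} m_t^{11} + (s-1)\,\chi_s\, m_t^{1s} + \omega_t, \tag{19}$$

$$\omega_t \stackrel{d}{=} [\alpha r Q_t]^{1/2}\,\mathcal{N}(0,1). \tag{20}$$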

Here $\stackrel{d}{=}$ indicates that the relation is valid in distribution,
$\chi_s = \frac{1}{s-1}\sum_{\rho>1}\lambda^{1\rho}$, $r = s[1 + (s-1)b^4]$,
and $\mathcal{N}(0,1)$ is a Gaussian random variable with mean zero and
variance unity. Furthermore, we have already averaged over $\xi^1$. The
brackets denote the further averages over $\lambda^{11}$, $\chi_s$ and
$\omega_t$. We recall that the average over $\lambda^{11}$ has to be done
according to the distribution (6).

The first term in expression (19) is the signal coming from the first
example of the first concept. The second term, recalling the assumed
quasi-symmetry of the examples, represents the signal of the other examples
of the first concept, with strength factor $\chi_s$. The third term is the
noise caused by the examples of the $(p-1)$ residual non-condensed concepts.
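Because Eqs. (15)-(20) hold for a general monotonic input-output function, their right-hand sides can also be estimated by direct sampling. This is convenient, for instance, for the analogue choice $F_\theta(x) = \tanh(x/\theta)$ considered at the end of Section 4. The sketch below is a plain Monte Carlo estimate offered only as an illustration. It samples $\lambda^{11}$ and $\chi_s$ exactly from Eq. (6) and requires $s \ge 2$. The name `mc_map` and its arguments are hypothetical. The starting values $m^{11} = 1$, $m^{1s} = b^2$, $Q = A$ describe an initial state equal to the first example, which is our own choice.

```python
import math
import random


def mc_map(m11, m1s, Q, F, *, A, b, s, alpha, samples=50_000, seed=0):
    """Monte Carlo estimate of Eqs. (15)-(18) for a transfer function F (needs s >= 2).

    Returns (m11, m1s, M1, Q) at time t + 1.
    """
    rng = random.Random(seed)
    b_plus = (1 + b) / 2
    r = s * (1 + (s - 1) * b ** 4)
    sd = math.sqrt(alpha * r * Q)          # standard deviation of omega_t, Eq. (20)

    def lam():
        return 1 if rng.random() < b_plus else -1

    acc_m11 = acc_m1s = acc_M = acc_q_on = acc_q_off = 0.0
    for _ in range(samples):
        lam11 = lam()
        chi = sum(lam() for _ in range(s - 1)) / (s - 1)              # chi_s
        h = lam11 * m11 + (s - 1) * chi * m1s + rng.gauss(0.0, sd)    # Eq. (19)
        f = F(h)
        acc_m11 += lam11 * f                                          # Eq. (15)
        acc_m1s += chi * f                                            # Eq. (16)
        acc_M += f                                                    # Eq. (17)
        acc_q_on += f * f
        acc_q_off += F(rng.gauss(0.0, sd)) ** 2   # sites with xi = 0 feel only noise
    n = samples
    return (acc_m11 / n, acc_m1s / n, acc_M / n,
            A * acc_q_on / n + (1 - A) * acc_q_off / n)               # Eq. (18)


if __name__ == "__main__":
    theta = 0.5

    def three_state(x):
        return (1 if x > 0 else -1) if abs(x) > theta else 0

    def analogue(x):
        return math.tanh(x / theta)

    for name, F in [("three-state", three_state), ("tanh", analogue)]:
        print(name, mc_map(1.0, 0.25, 0.3, F, A=0.3, b=0.5, s=5, alpha=0.01))
```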

Eqs. (15)-(18) give a complete description of the dynamics for the retrieval
of examples and the categorization of concepts by the network under
consideration, for a general monotonic input-output function $F_\theta$.
Choosing $F_\theta$ to be the three-state function (3), we obtain the
following explicit forms for the dynamics



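$$m_{t+1}^{11} = \sum_{j=0}^{s-1} P_b(j)\left\{b_+\left[\Phi_t^+(\Omega_+)+\Phi_t^-(\Omega_+)\right] - b_-\left[\Phi_t^+(\Omega_-)+\Phi_t^-(\Omega_-)\right]\right\}, \tag{21}$$

$$m_{t+1}^{1s} = \sum_{j=0}^{s-1} P_b(j)\,\frac{2j-s+1}{s-1}\left\{b_+\left[\Phi_t^+(\Omega_+)+\Phi_t^-(\Omega_+)\right] + b_-\left[\Phi_t^+(\Omega_-)+\Phi_t^-(\Omega_-)\right]\right\}, \tag{22}$$

$$M_{t+1}^{1} = \sum_{j=0}^{s-1} P_b(j)\left\{b_+\left[\Phi_t^+(\Omega_+)+\Phi_t^-(\Omega_+)\right] + b_-\left[\Phi_t^+(\Omega_-)+\Phi_t^-(\Omega_-)\right]\right\}, \tag{23}$$

$$Q_{t+1} = 1 - A\sum_{j=0}^{s-1} P_b(j)\left\{b_+\left[\Phi_t^+(\Omega_+)-\Phi_t^-(\Omega_+)\right] + b_-\left[\Phi_t^+(\Omega_-)-\Phi_t^-(\Omega_-)\right]\right\} - 2(1-A)\,\Phi_t^+(0). \tag{24}$$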
Here

$$P_b(j) = \binom{s-1}{j}\, b_+^{\,j}\, b_-^{\,s-1-j}, \tag{25}$$

and

$$\Phi_t^{\pm}(\Omega) = \operatorname{erf}\!\left(\frac{\Omega \pm \theta}{\sqrt{\alpha r Q_t}}\right), \qquad \Omega_\pm = (2j - s + 1)\,m_t^{1s} \pm m_t^{11}, \tag{26}$$

where $\operatorname{erf}(x) = \int_0^x dz\, e^{-z^2/2}/\sqrt{2\pi}$. In the case of
many examples per concept, we use for $\chi_s$ the Gaussian approximation
$\chi_s = b + z_s\sqrt{(1-b^2)/(s-1)}$, with $z_s = \mathcal{N}(0,1)$
independent of $\omega_t$.
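The explicit map (21)-(26) is straightforward to iterate numerically. The sketch below is one possible implementation, assuming $s \ge 2$. Note that the erf defined above is the Gaussian integral from 0 to $x$, i.e. half the standard error function evaluated at $x/\sqrt{2}$. The names `three_state_map` and `iterate` are hypothetical. The Gaussian approximation for large $s$ is not implemented; the exact binomial weights (25) are summed instead. The noiseless limit $\sqrt{\alpha r Q_t} \to 0$ is handled by replacing erf$(\pm\infty)$ with $\pm 1/2$. The initial state equal to the first example ($m^{11} = 1$, $m^{1s} = b^2$, $Q = A$) is our own choice.

```python
import math


def erf_paper(x):
    """erf(x) = int_0^x dz exp(-z^2/2) / sqrt(2 pi), as defined below Eq. (26)."""
    return 0.5 * math.erf(x / math.sqrt(2.0))


def three_state_map(m11, m1s, Q, *, A, b, s, alpha, theta):
    """One parallel step of Eqs. (21)-(24) (s >= 2). Returns (m11, m1s, M1, Q) at t + 1."""
    bp, bm = (1 + b) / 2, (1 - b) / 2
    r = s * (1 + (s - 1) * b ** 4)
    sd = math.sqrt(max(alpha * r * Q, 0.0))   # guard against tiny negative rounding in Q

    def phi(sign, omega):
        """Phi_t^{+/-}(Omega) of Eq. (26); sign = +1 or -1."""
        x = omega + sign * theta
        if sd == 0.0:  # noiseless limit: erf(+/- infinity) = +/- 1/2
            return 0.0 if x == 0 else math.copysign(0.5, x)
        return erf_paper(x / sd)

    m11_new = m1s_new = M_new = q_sum = 0.0
    for j in range(s):  # j = number of lambda^{1 rho} = +1 among the s - 1 examples rho > 1
        P = math.comb(s - 1, j) * bp ** j * bm ** (s - 1 - j)       # Eq. (25)
        k = 2 * j - s + 1
        om_p, om_m = k * m1s + m11, k * m1s - m11                   # Omega_+, Omega_-
        mean_p = phi(+1, om_p) + phi(-1, om_p)   # <F_theta(Omega_+ + omega_t)>
        mean_m = phi(+1, om_m) + phi(-1, om_m)   # <F_theta(Omega_- + omega_t)>
        m11_new += P * (bp * mean_p - bm * mean_m)                  # Eq. (21)
        m1s_new += P * k / (s - 1) * (bp * mean_p + bm * mean_m)    # Eq. (22)
        M_new += P * (bp * mean_p + bm * mean_m)                    # Eq. (23)
        q_sum += P * (bp * (phi(+1, om_p) - phi(-1, om_p))
                      + bm * (phi(+1, om_m) - phi(-1, om_m)))
    Q_new = 1 - A * q_sum - 2 * (1 - A) * phi(+1, 0.0)              # Eq. (24)
    return m11_new, m1s_new, M_new, Q_new


def iterate(m11, m1s, Q, steps=100, **params):
    """Iterate the map towards a fixed point; returns the last (m11, m1s, M1, Q)."""
    M = 0.0
    for _ in range(steps):
        m11, m1s, M, Q = three_state_map(m11, m1s, Q, **params)
    return m11, m1s, M, Q


if __name__ == "__main__":
    A, b = 0.3, 0.1
    m11, m1s, M, Q = iterate(1.0, b * b, A, A=A, b=b, s=5, alpha=0.01, theta=0.5)
    D, E = A - 2 * A * m11 + Q, A - 2 * A * M + Q   # Eqs. (7) and (10)
    print(f"m11={m11:.4f} m1s={m1s:.4f} M={M:.4f} Q={Q:.4f} D={D:.4f} E={E:.4f}")
```

For weakly correlated examples ($b = 0.1$, $s = 5$, $\alpha = 0.01$, $\theta = 0.5$, $A = 0.3$) this iteration should settle near $m^{11} \approx 1$, $M^1 \approx b$ and $Q \approx A$. That is a retrieval-type fixed point with $D$ close to zero.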

4. Retrieval and categorization 

We now discuss the structure of the retrieval and categorization dynamics,
which can be extracted by numerically solving the fixed-point equations
given by (21)-(24) (obtained by leaving out the $t$-dependence).

Besides the zero solution Z, determined by $m^{11} = M^1 = Q = 0$, it is
necessary to distinguish among the following types of solution:
the retrieval solutions R, defined by $m^{11} > M^1 > 0$ and $Q > 0$;
the categorization solutions G, defined by $0 < m^{11} < M^1$ and $Q > 0$;
and the self-sustained activity [22] solutions S, with $Q > 0$ but
$m^{11} = M^1 = 0$. For the retrieval (respectively categorization)
solutions we impose the further condition $D < 0.1$ (respectively
$E < 0.1$). The value 0.1 is chosen somewhat arbitrarily; the idea is to
guarantee a minimal retrieval (respectively generalization) quality of the
network.
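The classification above translates directly into a small decision function. In the sketch below, `classify` is a hypothetical helper, not from the paper. The numerical tolerance `tol` used to decide that a quantity vanishes is our own choice, while the cut 0.1 on $D$ and $E$ is the one stated in the text. $D$ and $E$ are obtained from the fixed-point values through Eqs. (7) and (10).

```python
def classify(m11, M, Q, A, tol=1e-6, cut=0.1):
    """Label a fixed point (m11, M1, Q) following the definitions of Section 4.

    Returns "Z", "S", "R", "G", or None when a nonzero solution fails the
    quality condition D < cut (retrieval) or E < cut (categorization).
    """
    D = A - 2 * A * m11 + Q   # Hamming distance to the example, Eq. (7)
    E = A - 2 * A * M + Q     # generalization error, Eq. (10)
    if abs(m11) < tol and abs(M) < tol:
        return "Z" if Q < tol else "S"
    if Q > tol and m11 > M > 0 and D < cut:
        return "R"
    if Q > tol and 0 < m11 < M and E < cut:
        return "G"
    return None


if __name__ == "__main__":
    print(classify(1.0, 0.1, 0.3, A=0.3))   # D = 0: retrieval
    print(classify(0.5, 0.95, 0.3, A=0.3))  # E = 0.03: categorization
```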

Since there are many parameters to be considered in the discussion of the 
numerical results we only show in Figs. 1-6 the properties of the network we 
believe to be typical and important. 

Figure 1 shows the Hamming distance $D = D_\infty^{11}$, the generalization
error $E = E_\infty^1$ and the neuron activity $Q = Q_\infty$ as a function of
the correlation $b$. The other parameters are chosen as follows:
the number of examples $s = 5$;
the loading rate $\alpha = 0.01$;
the activity $A$, which takes the value 1 (binary) in the upper part of the
figure and 0.3 (three-state) in the lower part;
and the zero-activity threshold $\theta$, which is 0 (binary) in the left
part of the figure and 0.5 (three-state) in the right part.
For binary patterns, the use of three-state neurons does not affect the
overall behaviour of the network. However, for three-state patterns both the
retrieval and the categorization abilities are improved. The transition from
an R phase to a G phase is clearly present in all data. At a critical value
of the correlation, $b_c$, the $D$ and $E$ curves cross. Here we remark that
in the case of three-state patterns ($A = 0.3$) and binary neurons
($\theta = 0$), the conditions for good retrieval and categorization of the
full patterns are, of course, not satisfied. But, as the curves indicate,
one finds the best possible retrieval of the active sites: with $m = 1$ and
$Q = 1$, Eq. (7) gives $D = 1 - A = 0.7$ for $b < b_c$. Furthermore, we note
the existence of a plateau for $D$ in the case of three-state patterns and
three-state neurons, where the Hamming distance is not small but still
satisfies $D < E$. Finally, in all cases there exists a minimal value of $E$,
meaning that the categorization is optimal for the corresponding network
parameters. This minimum does not always occur at $b = 1$. The reason is that
although the neuron activity $Q$ becomes high for large $b$, the pattern
activity $A$ may be so small that $\sigma_i$ cannot match $\xi_i$. This is in
agreement with Eq. (10).

This behaviour is further illustrated in the typical $(\theta, b)$ and
$(A, b)$ phase diagrams of Fig. 2 and Fig. 3, respectively. These show the
different phases corresponding to the solutions of the fixed-point equations
described above. We have divided the R-phase into two regions. One of them
($R_+$) refers to the region where $D$ is almost zero; the other ($R_-$)
indicates the region where $D$ has already jumped to the plateau seen in
Fig. 1. The thin dashed line $G_{\rm opt}$ in the phase diagrams describes
the optimal categorization. We note the existence of two S-phases in Fig. 3:
the first one separates the R- and G-phases, and the second one occurs for
large correlations $b$. The first indicates a region where $b$ is already too
large to have retrieval but still too small to allow categorization. The
second describes the high neuron activity region for large $b$ mentioned
above.

In Fig. 4 we plot the behaviour of the network as a function of the
zero-activity threshold $\theta$ for $A = 0.01$ and $s = 80$ examples per
concept. On the left we show both the retrieval overlap $m$ and the overlap
with a concept $M$ for $\alpha = 0.02$ and small correlations $b = 0.1$, such
that an R-phase exists. For an appropriate choice of the threshold,
$m \approx 1$ while $M$ becomes small. We note that although the concept
storage seems to be rather small ($\alpha = p/C = 0.02$), the example storage
is large for correlated patterns ($\alpha_s = s\alpha = 1.6$). On the right we
display $M$ for several values of $\alpha$ and large correlations
($b = 0.5$), such that we are in the G-phase. For increasing values of the
threshold up to $\theta = \theta_{\rm opt}(\alpha)$, the overlap with a concept
becomes larger, indicating that the categorization ability improves. For
$\theta > \theta_{\rm opt}(\alpha)$ this categorization ability slowly
decreases, and at $\theta = \theta_Z(\alpha)$ the overlap $M$ falls abruptly
to zero. This illustrates that for a very carefully tuned $\theta$,
$M \approx 1$, implying that categorization stays successful even for a
concept loading larger than $\alpha = 3$. So, compared with the
categorization properties of the binary concept model (see Fig. 2 of [16]),
we find that by using three-state concepts the categorization error is
smaller and a greater number of concepts can be categorized.

Next, an $(\alpha, \theta)$ phase diagram is shown in Fig. 5. We take
$A = 0.01$, $b = 0.5$ and a large number of examples $s = 80$, such that no
R-phase appears. Categorization is then possible within the full-line
boundaries. Again, the line $G_{\rm opt}$ indicates the values
$\theta(\alpha)$ for which $M$ is maximal, so that $E$ is minimal and the
categorization is optimal. In the region between the full line and the
(thick) dashed line, $M$ is also of order 1. However, $Q$ is of the same
order, so the condition for good categorization, $E < 0.1$, is not
satisfied. Above the dashed-dotted line an S-phase exists. Phase coexistence
is possible in some regions, and which phase is attained depends on the
initial conditions $m_0^{11}$, $m_0^{1s}$ and $Q_0$.

Finally, in Fig. 6 we show $m$ and $M$ as a function of $\theta$ for an
analogue input-output relation $F_\theta(x) = \tanh(x/\theta)$. For
comparison, the same network parameters are used as in Fig. 4 for the
three-state case, and $\alpha$ again takes several values. Concerning
retrieval of the examples, a behaviour similar to that in Fig. 4 is found,
with a slightly smaller $m$. Concerning categorization, however, $M$ does not
come close to 1 for larger $\alpha$, although an analogous non-monotonic
behaviour in $\theta$ is seen. This demonstrates that the gain parameter of a
continuous input-output relation does not play the role that the
zero-activity threshold plays in the three-state case. The reason is that in
the three-state case this threshold switches off the neurons whose field
$h_i$ is not large enough, so that the $\sigma_i$ can match the three-state
patterns $\xi_i$.

5. Concluding remarks 

We have studied the retrieval and categorization properties of an extremely
diluted three-state neural network through the solution of its parallel
dynamics. In comparison with existing models in the literature, the concepts
are allowed to be three-state, and the retrieval overlaps between the
examples and the microscopic state of the network are not assumed to be
fully symmetric. We find that the important parameters governing the
transition from the retrieval to the categorization phase are the number of
examples per concept and their correlations. By choosing the zero-activity
threshold of the neurons appropriately, categorization can be considerably
improved. In particular, in comparison with models for binary concepts, the
categorization error is smaller and a much greater number of concepts can be
categorized.
Figure 1: The Hamming distance $D$ (dashed line), the categorization error
$E$ (full line) and the neuron activity $Q$ (thin dashed-dotted line) as a
function of the correlation $b$. The other network parameters are chosen as
follows: the number of examples $s = 5$, the loading rate $\alpha = 0.01$,
the activity $A = 1$ in the upper part and $A = 0.3$ in the lower part, and
the zero-activity threshold $\theta = 0$ in the left part and $\theta = 0.5$
in the right part.
Figure 2: The $(\theta, b)$ phase diagram with $A = 0.1$, $s = 5$ and
$\alpha = 0.01$. The following phases occur: the retrieval phases $R_+$ and
$R_-$, the categorization phase $G$, the self-sustained activity phase $S$
and the phase $Z$ corresponding to the zero fixed point. The thin dashed
line $G_{\rm opt}$ indicates optimal categorization.
Figure 3: The $(A, b)$ phase diagram with $\theta = 0.5$, $s = 5$ and
$\alpha$ […]. The lines are as in Fig. 2.
Figure 4: Left: The overlaps $m$ (full line) and $M$ (dashed line) as a
function of $\theta$ for $A = 0.01$, $s = 80$, $b = 0.1$ and $\alpha = 0.02$.
Right: The overlap $M$ as a function of $\theta$ for $A = 0.01$, $s = 80$,
$b = 0.5$ and $\alpha = 0.5$ (dashed-dotted line), $\alpha = 1$ (dashed
line), $\alpha = 2$ (dotted line) and $\alpha = 3$ (full line).




Figure 5: The $(\theta, \alpha)$ phase diagram with $A = 0.01$, $s = 80$ and
$b = 0.5$. Inside the full line the G-phase exists. Between the (thick)
dashed line and the full line, the condition $E < 0.1$ is not satisfied.
Below the dashed-dotted line no S-phase exists. The line $G_{\rm opt}$ is as
in Fig. 2.
Figure 6: Left: The overlaps $m$ (full line) and $M$ (dashed line) as a
function of $\theta$ for analogue neurons with $A = 0.01$, $s = 80$,
$b = 0.5$ and $\alpha = 0.004$. Right: The overlap $M$ as a function of
$\theta$ for $A = 0.01$, $s = 80$, $b = 0.5$ and $\alpha = 0.1$
(dashed-dotted line), $\alpha = 0.3$ (dashed line), $\alpha = 0.5$ (dotted
line) and $\alpha = 0.7$ (full line).